Text punctuation restoration for Vietnamese speech recognition with multimodal features
Hua LAI, Tong SUN, Wenjun WANG, Zhengtao YU, Shengxiang GAO, Ling DONG
Journal of Computer Applications, 2024, 44(2): 418-423. DOI: 10.11772/j.issn.1001-9081.2023020231

The text sequences output by Vietnamese speech recognition systems lack punctuation, and punctuating the recognized text helps eliminate ambiguity and makes the text easier to understand. However, punctuation restoration models based on the text modality alone predict punctuation inaccurately on noisy text, because phoneme errors are common in Vietnamese speech recognition and can corrupt the semantics of the text. Therefore, a punctuation restoration method for Vietnamese speech recognition text that exploits multimodal features was proposed, using intonation pauses and tone changes in Vietnamese speech to guide correct punctuation prediction on noisy text. Specifically, Mel-Frequency Cepstral Coefficients (MFCC) were used to extract speech features, a pre-trained language model was used to extract contextual text features, and a label attention mechanism was used to fuse the speech and text features, thereby enhancing the model's ability to learn contextual information from noisy Vietnamese text. Experimental results show that, compared with punctuation restoration models based on Transformer and BERT (Bidirectional Encoder Representations from Transformers) that extract only text features, the proposed method improves precision, recall, and F1 score on a Vietnamese dataset by at least 10 percentage points, demonstrating that fusing speech and text features effectively improves punctuation prediction accuracy for noisy Vietnamese speech recognition text.
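The abstract does not give the exact architecture; as a rough illustration of the idea, the PyTorch sketch below combines per-token text features from a pre-trained language model with aligned MFCC speech features and applies a simple label attention step before per-token punctuation classification. All dimensions, names, and fusion details are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class LabelAttentionFusion(nn.Module):
        # Illustrative sketch only: dimensions, module names, and the exact
        # fusion rule are assumptions, not the paper's implementation.
        def __init__(self, text_dim=768, speech_dim=39, hidden=256, num_labels=4):
            super().__init__()
            self.text_proj = nn.Linear(text_dim, hidden)      # pre-trained LM features
            self.speech_proj = nn.Linear(speech_dim, hidden)  # per-token MFCC features
            # one learned embedding per punctuation label
            # (e.g. none, comma, period, question mark)
            self.label_emb = nn.Embedding(num_labels, hidden)
            self.classifier = nn.Linear(hidden, num_labels)

        def forward(self, text_feats, speech_feats):
            # text_feats: (batch, tokens, text_dim); speech_feats:
            # (batch, tokens, speech_dim), assuming MFCC frames have
            # already been aligned and pooled per token
            h = self.text_proj(text_feats) + self.speech_proj(speech_feats)
            # label attention: score each fused token representation against
            # every label embedding, then mix label embeddings into the tokens
            attn = torch.softmax(h @ self.label_emb.weight.t(), dim=-1)
            h = h + attn @ self.label_emb.weight
            return self.classifier(h)  # per-token punctuation logits

    model = LabelAttentionFusion()
    logits = model(torch.randn(2, 10, 768), torch.randn(2, 10, 39))  # (2, 10, 4)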

Neural machine translation integrating bidirectional-dependency self-attention mechanism
Zhijin LI, Hua LAI, Yonghua WEN, Shengxiang GAO
Journal of Computer Applications, 2022, 42(12): 3679-3685. DOI: 10.11772/j.issn.1001-9081.2021101805

To address the problem of resource scarcity in neural machine translation, a method for fusing dependency syntactic knowledge based on a Bidirectional-Dependency self-attention mechanism (Bi-Dependency) was proposed. Firstly, an external parser was used to parse the source sentence and obtain dependency parse data. Then, the dependency parse data was transformed into position vectors of parent words and a weight matrix of child words. Finally, this dependency knowledge was integrated into the multi-head attention mechanism of the Transformer encoder, so that the translation model could simultaneously attend to dependency information in both directions: from parent word to child word and from child word to parent word. Experimental results on bidirectional translation show that, compared with the Transformer model, in the rich-resource setting the proposed method improves the BLEU (BiLingual Evaluation Understudy) score by 1.07 and 0.86 in the two directions of Chinese-Thai translation, and by 0.79 and 0.68 in the two directions of Chinese-English translation; in the low-resource setting, it improves BLEU by 0.51 and 1.06 on Chinese-Thai translation and by 1.04 and 0.40 on Chinese-English translation. Bi-Dependency thus provides the model with richer dependency information and effectively improves translation performance.
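The abstract does not spell out the exact formulation; the single-head PyTorch sketch below shows the general mechanism of injecting bidirectional parent-child dependency links from an external parser into self-attention scores. The additive bias matrix and the learned scalar weight are illustrative assumptions standing in for the paper's position vectors and weight matrix.

    import torch
    import torch.nn as nn

    def dependency_bias(heads):
        # heads[i] = index of token i's parent from an external dependency
        # parser (the root points to itself). The returned (T, T) matrix marks
        # both directions: parent -> child and child -> parent.
        n = len(heads)
        bias = torch.zeros(n, n)
        for child, parent in enumerate(heads):
            if parent != child:
                bias[child, parent] = 1.0  # child attends to its parent
                bias[parent, child] = 1.0  # parent attends to its child
        return bias

    class BiDependencyAttention(nn.Module):
        # Sketch: ordinary scaled dot-product attention plus a learned
        # scalar weight on the dependency bias (an assumption).
        def __init__(self, dim):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            self.dep_weight = nn.Parameter(torch.tensor(1.0))
            self.scale = dim ** -0.5

        def forward(self, x, dep_bias):
            # x: (batch, tokens, dim); dep_bias: (tokens, tokens)
            scores = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
            scores = scores + self.dep_weight * dep_bias  # inject syntax
            return torch.softmax(scores, dim=-1) @ self.v(x)

    x = torch.randn(1, 4, 64)
    out = BiDependencyAttention(64)(x, dependency_bias([1, 1, 1, 2]))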

Event detection without trigger words incorporating syntactic information
Cui WANG, Yafei ZHANG, Junjun GUO, Shengxiang GAO, Zhengtao YU
Journal of Computer Applications, 2021, 41(12): 3534-3539. DOI: 10.11772/j.issn.1001-9081.2021060928

Event Detection (ED) is one of the most important tasks in information extraction, aiming to identify instances of specific event types in text. Existing ED methods usually express syntactic dependencies with an adjacency matrix, which then has to be encoded with a Graph Convolutional Network (GCN) to obtain syntactic information, increasing the complexity of the model. Therefore, an event detection method without trigger words that incorporates syntactic information was proposed. After the dependent parent word and its context were converted into position marker vectors, the word embeddings of the dependent child words were incorporated at the source end of the model in a parameter-free manner to strengthen the semantic representation of the context, without needing a GCN for encoding. In addition, since labeling trigger words is time-consuming and labor-intensive, a type perceptron based on the multi-head attention mechanism was designed, which models the potential trigger words in a sentence to perform event detection without trigger words. To verify the performance of the proposed method, experiments were conducted on the ACE2005 dataset and a low-resource Vietnamese dataset. Compared with the Event Detection Using Graph Transformer Network (GTN-ED) method, the F1 score of the proposed method was increased by 3.7% on the ACE2005 dataset; compared with the binary classification method Type-aware Bias Neural Network with Attention Mechanisms (TBNNAM), the F1 score was increased by 9% on the Vietnamese dataset. The results show that integrating syntactic information into the Transformer can effectively connect the scattered event information in a sentence and improve the accuracy of event detection.
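As a rough illustration under stated assumptions (the fusion rule, the per-type sigmoid scoring, and all dimensions are hypothetical, not the paper's exact scheme), the PyTorch sketch below pairs a parameter-free, source-side fusion of child-word embeddings with a multi-head-attention type perceptron that scores each event type without trigger annotations.

    import torch
    import torch.nn as nn

    def fuse_child_embeddings(word_emb, heads):
        # Parameter-free fusion sketch: average each token's embedding with
        # the embeddings of its dependent child words, so syntactic context
        # enters at the source end without a GCN. An assumption for
        # illustration only.
        out = word_emb.clone()                 # word_emb: (tokens, dim)
        counts = torch.ones(len(heads), 1)
        for child, parent in enumerate(heads):
            if parent != child:
                out[parent] += word_emb[child]
                counts[parent] += 1
        return out / counts

    class TypePerceptron(nn.Module):
        # Sketch of a type perceptron: one learned embedding per event type
        # queries the encoded sentence via multi-head attention, and each
        # type's context vector is scored for the presence of that event,
        # so no trigger-word labels are needed. Structure is an assumption.
        def __init__(self, dim, num_types):
            super().__init__()
            self.type_emb = nn.Embedding(num_types, dim)
            self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
            self.score = nn.Linear(dim, 1)

        def forward(self, token_feats):
            # token_feats: (batch, tokens, dim) from a Transformer encoder
            q = self.type_emb.weight.unsqueeze(0).expand(token_feats.size(0), -1, -1)
            ctx, _ = self.attn(q, token_feats, token_feats)    # (batch, types, dim)
            return torch.sigmoid(self.score(ctx)).squeeze(-1)  # (batch, types)

    # ACE2005 defines 33 event subtypes, hence num_types=33 here
    probs = TypePerceptron(dim=256, num_types=33)(torch.randn(2, 12, 256))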
